[CANN]Support OP MUL_MAT_ID #13042

Merged 1 commit on May 19, 2025

Conversation

@noemotiovon (Contributor) commented Apr 21, 2025

Why is this PR needed?

Add support for the MUL_MAT_ID operator to the CANN backend; mixture-of-experts (MoE) models require it.
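
For readers unfamiliar with the operator, here is a minimal reference sketch of what MUL_MAT_ID computes, written in plain Python. It is not the CANN implementation from this PR. The function name `mul_mat_id`, the nested-list layout, and the reading of `b=1` in the test names below as "broadcast one input row across all expert slots" are assumptions made for illustration.

```python
def mul_mat_id(experts, b, ids):
    """Reference sketch of an expert-indexed matrix multiply.

    experts[e] is an m x k weight matrix (list of m rows of length k).
    b[t] holds the input rows for token t: either one row per expert slot,
        or a single row that is reused (broadcast) for every slot.
    ids[t][slot] is the index of the expert chosen for that token and slot.
    Returns out[t][slot], a vector of length m.
    """
    out = []
    for t, token_ids in enumerate(ids):
        token_out = []
        for slot, expert in enumerate(token_ids):
            # Broadcast: wrap around when b[t] has fewer rows than slots.
            x = b[t][slot % len(b[t])]
            w = experts[expert]
            token_out.append([sum(wi * xi for wi, xi in zip(row, x)) for row in w])
        out.append(token_out)
    return out


# Two 2x2 experts: identity and 2 * identity.
experts = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
b = [[[1.0, 2.0]]]   # one token, one input row (broadcast to both slots)
ids = [[1, 0]]       # the token uses expert 1, then expert 0
result = mul_mat_id(experts, b, ids)
```

In the test names, `n_mats` would correspond to `len(experts)` and `n_used` to the number of slots per token.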

Op Test

Backend 1/2: CANN0
ggml_backend_cann_context: device 0 async operator submission is OFF
  Device description: Ascend910B3
  Device memory: 62432 MB (62145 MB free)

new_pool_for_device: device 0 use vmm pool
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F32,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=1,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=F16,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=129,k=256): OK
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_0,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=1,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=2,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=0,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=8,n_used=4,b=1,m=512,n=129,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q4_1,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q5_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q5_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q5_1,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q5_1,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=Q8_0,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): not supported [CANN0] 
  MUL_MAT_ID(type_a=UNKNOWN,type_b=F32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): not supported [CANN0] 
  5473/5473 tests passed
  Backend CANN0: OK

Backend 2/2: CPU
  Skipping
2/2 backends passed
OK
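
The case names in the log follow a regular parameter grid. The sketch below regenerates that grid for one `type_a`, using only values read off the log above. The helper name `case_names` is hypothetical, and this is not the test harness's own code.

```python
from itertools import product


def case_names(type_a, type_b="F32", m=512, k=256):
    """Rebuild the MUL_MAT_ID case names for one type_a, in log order."""
    names = []
    # Loop order matches the log: n_mats varies slowest, n varies fastest.
    for n_mats, n_used, b, n in product((4, 8), (1, 2, 4), (0, 1), (1, 32, 129)):
        names.append(
            f"MUL_MAT_ID(type_a={type_a},type_b={type_b},n_mats={n_mats},"
            f"n_used={n_used},b={b},m={m},n={n},k={k})"
        )
    return names


f16_cases = case_names("F16")
print(len(f16_cases))
```

Each fully listed type in the log (F32, F16, Q8_0, Q4_0, Q4_1) accounts for 2 x 3 x 2 x 3 = 36 lines. Only the F32 and F16 cases report OK; the quantized types are reported as not supported on CANN0.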

TODO

The current matrix multiplication in MOE is quite slow. I’ll keep investing effort into this and look for a suitable aclnn acceleration operator as a replacement.

@github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Apr 21, 2025
@hipudding added the "Ascend NPU" label (issues specific to Ascend NPUs) on Apr 27, 2025
@hipudding self-requested a review on April 27, 2025 01:42
@noemotiovon (Contributor, Author) commented

@hipudding,
I’ve updated the code based on the review comments and switched to using aclnn's GroupedMatmul. Thank you very much for reviewing my code!
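
As a rough illustration of why a grouped matmul suits this operator: the (token, slot) pairs can be bucketed by expert, so each expert's weights are visited once for a whole batch of rows instead of once per token. The sketch below shows only that grouping idea in plain Python. It does not call or model the aclnn GroupedMatmul API, and the names `group_by_expert` and `grouped_mul_mat_id` are hypothetical.

```python
from collections import defaultdict


def group_by_expert(ids):
    """Map each expert index to the (token, slot) pairs routed to it."""
    groups = defaultdict(list)
    for t, token_ids in enumerate(ids):
        for slot, expert in enumerate(token_ids):
            groups[expert].append((t, slot))
    return dict(groups)


def grouped_mul_mat_id(experts, b, ids):
    """Same result as a per-token loop, but iterating expert by expert."""
    out = [[None] * len(token_ids) for token_ids in ids]
    for expert, pairs in group_by_expert(ids).items():
        w = experts[expert]  # weights are fetched once per group
        for t, slot in pairs:
            x = b[t][slot % len(b[t])]  # reuse a single input row if broadcast
            out[t][slot] = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return out


experts = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
b = [[[1, 2]], [[3, 4]]]   # two tokens, one input row each
ids = [[0, 1], [1, 0]]     # each token uses both experts
groups = group_by_expert(ids)
out = grouped_mul_mat_id(experts, b, ids)
```

A real grouped kernel would replace the inner loop with one batched matrix product per group; the bucketing step is the part this sketch is meant to show.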

@noemotiovon (Contributor, Author) commented

Model: granite3-moe:3b-instruct-fp16
Test:

llama_perf_sampler_print:    sampling time =      79.27 ms /   132 runs   (    0.60 ms per token,  1665.30 tokens per second)
llama_perf_context_print:        load time =    4261.29 ms
llama_perf_context_print: prompt eval time =     242.70 ms /    20 tokens (   12.14 ms per token,    82.41 tokens per second)
llama_perf_context_print:        eval time =    7069.77 ms /   111 runs   (   63.69 ms per token,    15.70 tokens per second)
llama_perf_context_print:       total time =    8205.13 ms /   131 tokens
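
The per-token figures in parentheses follow directly from the totals. A small sketch of that arithmetic, using the prompt eval and eval lines above (the helper name `per_token_stats` is hypothetical):

```python
def per_token_stats(total_ms, count):
    """Return (milliseconds per token, tokens per second) for one timing line."""
    ms_per_token = total_ms / count
    tokens_per_second = count / (total_ms / 1000.0)
    return ms_per_token, tokens_per_second


prompt_ms, prompt_tps = per_token_stats(242.70, 20)    # prompt eval line
eval_ms, eval_tps = per_token_stats(7069.77, 111)      # eval (generation) line

print(f"prompt eval: {prompt_ms:.2f} ms/token, {prompt_tps:.2f} tokens/s")
print(f"eval:        {eval_ms:.2f} ms/token, {eval_tps:.2f} tokens/s")
```

The eval line is the one that reflects generation speed on this model: about 15.7 tokens per second.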

@hipudding merged commit 33d7aed into ggml-org:master on May 19, 2025
45 checks passed
infil00p pushed a commit to baseweight/llama.cpp that referenced this pull request May 22, 2025